This is an evaluation of forecasts of Covid-19 case and death numbers submitted by the Epiforecasts team to the European COVID-19 Forecast Hub (see the European Forecast Hub GitHub page). These include crowd forecasts made through the crowdforecastr app as well as forecasts from a model created using EpiNow2.

This report is intended as a basic evaluation of predictions that helps forecasters better understand their performance. Its structure and visualisations are likely to change in the future, and we cannot rule out mistakes. You can learn more and provide feedback by creating an issue on our GitHub repository.



Forecaster ranking

Here is an overall ranking of all forecasters, ordered by relative skill. Relative skill is calculated from all pairwise comparisons between forecasters in terms of the weighted interval score (WIS); see below for a more detailed explanation of the scoring metrics used. ‘Overall’ shows the complete ranking, while ‘latest’ only covers the last 5-6 weeks of data. ‘Detailed’ contains the full data set, which you can download for your own analysis.

Ranking tables (not reproduced in this text version), with tabs for: overall, latest, cases, deaths, overall by horizon, detailed.
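The pairwise tournament behind relative skill (defined in more detail under Evaluation metrics below) can be sketched in a few lines of Python. This is an illustration under assumptions, not the code used to produce this report: scores are held in a hypothetical dictionary mapping each model to its WIS per target, and each model's geometric mean is taken over its ratios against all models, including itself (ratio 1).

```python
import math
from typing import Dict, Optional

# Hypothetical layout: model name -> {target id -> WIS}. A target would be e.g.
# a (location, target variable, forecast date, horizon) combination.
Scores = Dict[str, Dict[str, float]]


def mean_score_ratio(scores: Scores, a: str, b: str) -> Optional[float]:
    """Mean WIS of model a divided by mean WIS of model b, over shared targets."""
    common = scores[a].keys() & scores[b].keys()
    if not common:
        return None  # the two models never forecast the same target: skip the pair
    mean_a = sum(scores[a][t] for t in common) / len(common)
    mean_b = sum(scores[b][t] for t in common) / len(common)
    if mean_a == 0 or mean_b == 0:
        return None  # avoid division by zero and log(0) in this sketch
    return mean_a / mean_b


def relative_skill(scores: Scores) -> Dict[str, float]:
    """Geometric mean of each model's mean score ratios against all models."""
    skill = {}
    for a in scores:
        ratios = []
        for b in scores:  # includes b == a, whose ratio is 1
            r = mean_score_ratio(scores, a, b)
            if r is not None:
                ratios.append(r)
        if not ratios:
            skill[a] = math.nan
            continue
        skill[a] = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    return skill


example = {
    "A": {"t1": 10.0, "t2": 20.0},
    "B": {"t1": 20.0, "t2": 40.0, "t3": 5.0},
}
print(relative_skill(example))  # A ≈ 0.707, B ≈ 1.414
```

When all models forecast exactly the same targets, this reduces to each model's mean WIS divided by the geometric mean of all models' mean WIS, which is why a value below one indicates a better-than-average forecaster.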



Evaluation metrics

  • Relative skill is a metric based on the weighted interval score (WIS) that uses a ‘pairwise comparison tournament’. Every pair of forecasters is compared in terms of the WIS: for each pair, the mean score of each model is computed over the set of targets for which both models have made a prediction, and the ratio of these two means gives a mean score ratio. A forecaster's relative skill is the geometric mean of its mean score ratios. Smaller values are better, and a value below one means that the model beats the average forecasting model.
  • The weighted interval score is a proper scoring rule (meaning that you cannot improve your expected score by reporting anything other than your true belief) suited to scoring forecasts in an interval format. It has three components: sharpness, underprediction and overprediction. Sharpness reflects the width of your prediction intervals. Over- and underprediction only come into play if a prediction interval does not cover the true value. They are the absolute difference between the true value and the lower bound of the interval (overprediction, when your forecast is too high) or the upper bound of the interval (underprediction, when your forecast is too low).
  • coverage deviation is the average difference between empirical and nominal interval coverage. Say your 50 percent prediction interval covers only 20 percent of all true values; then your coverage deviation for that interval is 0.2 - 0.5 = -0.3. The coverage deviation value in the table is calculated by averaging the coverage deviation over all prediction intervals. If the value is negative, your intervals covered less than they should have. If it is positive, your forecasts could be a little more confident.
  • bias is a measure between -1 and 1 that expresses your tendency to underpredict (-1) or overpredict (1). Unlike the over- and underprediction components of the WIS, it is bounded and cannot grow arbitrarily large, which makes it less susceptible to outliers.
  • aem is the absolute error of your median forecasts. A high aem means your median forecasts tend to be far away from the true values.
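To make the WIS components and the other interval metrics concrete, here is a minimal Python sketch for forecasts given as a median plus central prediction intervals. It assumes the standard WIS weighting (1/2 for the median, α/2 for each interval with nominal coverage 1 − α, normalised by K + 1/2 for K intervals). The function and type names are hypothetical, bias is omitted, and this is not the code used to produce this report.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (alpha, lower, upper): a central prediction interval with nominal coverage
# 1 - alpha, e.g. (0.5, l, u) for the 50% interval, (0.1, l, u) for the 90% one.
Interval = Tuple[float, float, float]


@dataclass
class WISComponents:
    sharpness: float
    overprediction: float
    underprediction: float

    @property
    def wis(self) -> float:
        # The three components add up to the weighted interval score.
        return self.sharpness + self.overprediction + self.underprediction


def wis_components(y: float, median: float, intervals: List[Interval]) -> WISComponents:
    """WIS of one forecast against the observed value y, split into components."""
    denom = len(intervals) + 0.5
    # The median's absolute error (weight 1/2) counts as over- or underprediction.
    over = 0.5 * max(median - y, 0.0)
    under = 0.5 * max(y - median, 0.0)
    sharpness = 0.0
    for alpha, lower, upper in intervals:
        sharpness += alpha / 2 * (upper - lower)  # weighted interval width
        over += max(lower - y, 0.0)   # true value below the interval: too high
        under += max(y - upper, 0.0)  # true value above the interval: too low
    return WISComponents(sharpness / denom, over / denom, under / denom)


def coverage_deviation(forecasts: List[Tuple[float, List[Interval]]]) -> float:
    """Mean over interval levels of (empirical coverage - nominal coverage).

    `forecasts` is a list of (observed value, intervals) pairs.
    """
    hits: Dict[float, List[bool]] = {}
    for y, intervals in forecasts:
        for alpha, lower, upper in intervals:
            hits.setdefault(alpha, []).append(lower <= y <= upper)
    deviations = [sum(h) / len(h) - (1 - alpha) for alpha, h in hits.items()]
    return sum(deviations) / len(deviations)


def absolute_error_median(y: float, median: float) -> float:
    return abs(y - median)


# Observed value 12, median 10, 50% interval [8, 11], 90% interval [5, 15].
intervals = [(0.5, 8.0, 11.0), (0.1, 5.0, 15.0)]
scores = wis_components(12.0, 10.0, intervals)
print(scores, scores.wis)  # sharpness 0.5, overprediction 0.0, underprediction 0.8, WIS ≈ 1.3
print(coverage_deviation([(12.0, intervals)]))  # ≈ -0.2: 50% missed (0 - 0.5), 90% hit (1 - 0.9)
print(absolute_error_median(12.0, 10.0))  # 2.0
```

In this example the true value lies above the 50% interval and above the median, so all of the penalty beyond the interval widths is underprediction.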



Forecast visualisation

This is a visualisation of all forecasts made so far.

Figures (not reproduced in this text version): one tab per forecast date (2021-03-22, 2021-03-15 and 2021-03-08), each showing incident case and incident death forecasts for Austria, Belgium, Bulgaria, Switzerland, Cyprus, Czechia, Germany, Denmark, Estonia, Spain, Finland, France, United Kingdom, Greece, Croatia, Hungary, Ireland, Iceland, Italy, Liechtenstein, Lithuania, Luxembourg, Latvia, Malta, Netherlands, Norway, Poland, Portugal, Romania, Sweden, Slovenia and Slovakia.



Scores over time

Here you can see a visualisation of forecaster scores over time, shown next to the true observed values:

Figures (not reproduced in this text version): one panel per country (the same 32 countries as above), each showing the weighted interval score, overprediction, underprediction and sharpness.



Ranks over time

This table shows either your rank among all forecasters or your standardised rank. Ranks are determined by the weighted interval score: for a given target and forecast date, every forecaster is assigned a rank from 1 (best) to the number of available forecasts (worst). The standardised rank is 100 minus the forecaster's percentile rank among all forecasters, which maps these ranks onto a scale from 0 to 100 on which 100 is best and 0 is worst.

Figures (not reproduced in this text version): model rank and standardised model rank for each of the same 32 countries.
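A minimal Python sketch of the ranking step for a single target and forecast date. The exact percentile-rank convention used in this report is not stated, so the linear mapping 100 × (n − rank) / (n − 1) below is an assumption that sends the best of n forecasters to 100 and the worst to 0; ties are broken by input order, and the function names are hypothetical.

```python
from typing import Dict


def ranks(wis_by_model: Dict[str, float]) -> Dict[str, int]:
    """Rank forecasters on one target and forecast date: 1 = lowest (best) WIS."""
    ordered = sorted(wis_by_model, key=wis_by_model.__getitem__)  # ties keep input order
    return {model: i + 1 for i, model in enumerate(ordered)}


def standardised_ranks(wis_by_model: Dict[str, float]) -> Dict[str, float]:
    """Map ranks linearly onto 0-100 (assumed formula): best -> 100, worst -> 0."""
    r = ranks(wis_by_model)
    n = len(r)
    if n == 1:
        return {model: 100.0 for model in r}  # a lone forecaster is trivially best
    return {model: 100 * (n - rank) / (n - 1) for model, rank in r.items()}


print(standardised_ranks({"A": 3.0, "B": 1.0, "C": 2.0}))
# {'B': 100.0, 'C': 50.0, 'A': 0.0}
```

Standardising in this way makes ranks comparable across dates with different numbers of forecasters: finishing second out of three and second out of twenty give very different standardised ranks.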



WIS decomposition

The weighted interval score can be decomposed into three parts: sharpness (the amount of uncertainty around the forecast), overprediction and underprediction. This visualisation shows how the total penalty is distributed across these three components for the different forecasters.

Figures (not reproduced in this text version): the decomposition overall and for each of the same 32 countries.
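One way to summarise such a decomposition numerically is to add up each forecaster's sharpness, overprediction and underprediction over all scored forecasts and express each as a share of the total. A minimal Python sketch, assuming a hypothetical row layout of (model, sharpness, overprediction, underprediction); the plots in this report may aggregate differently, for example by showing absolute rather than relative contributions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One row per scored forecast: (model, sharpness, overprediction, underprediction).
Row = Tuple[str, float, float, float]


def component_shares(rows: Iterable[Row]) -> Dict[str, Dict[str, float]]:
    """Share of each model's total WIS that comes from each component."""
    totals: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    for model, sharpness, over, under in rows:
        t = totals[model]
        t[0] += sharpness
        t[1] += over
        t[2] += under
    shares = {}
    for model, (s, o, u) in totals.items():
        total = s + o + u
        if total == 0:  # a perfect score has no penalty to split
            shares[model] = {"sharpness": 0.0, "overprediction": 0.0, "underprediction": 0.0}
        else:
            shares[model] = {
                "sharpness": s / total,
                "overprediction": o / total,
                "underprediction": u / total,
            }
    return shares


rows = [("A", 1.0, 2.0, 1.0), ("A", 1.0, 0.0, 1.0), ("B", 3.0, 0.0, 1.0)]
print(component_shares(rows))
# A: one third each; B: 0.75 sharpness, 0.0 overprediction, 0.25 underprediction
```

A forecaster whose penalty is dominated by sharpness is mostly paying for wide intervals, whereas one dominated by over- or underprediction is being penalised for missing the observed values.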



Models and available forecasts

The following graphic gives an overview of the forecasters and models analysed and the number of forecasts they contributed.

Most of the ‘models’ are human forecasters, but some are not:

  • EpiExpert-ensemble is the ensemble formed by taking the mean of all submitted human forecasts.
  • EpiNow2 is an exponential growth model that uses a time-varying Rt trajectory to predict latent infections and then convolves these infections with estimated delays to observations, using a negative binomial observation model with a day-of-week effect. It makes limited assumptions and is not tuned to the specifics of Covid-19 in Germany and Poland beyond epidemiological details such as literature estimates of the generation time and incubation period, and the population of each area. The method and underlying theory are under active development; more details are available in the EpiNow2 documentation.
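The generative structure described above (latent infections driven by Rt, convolved with a reporting delay, then observed with negative binomial noise and a day-of-week effect) can be illustrated with a forward simulation. This Python sketch is not EpiNow2's implementation: EpiNow2 is an R package that estimates Rt and the delays from data using Stan, whereas here the renewal equation linking Rt to infections is an assumption, and Rt, the generation-time and delay distributions, the day-of-week multipliers and the overdispersion are all fixed, made-up inputs. The function name and its parameters are hypothetical.

```python
import numpy as np


def simulate_reports(rt, generation_pmf, delay_pmf, seed_infections, dow_effect, phi, rng):
    """Forward-simulate reported counts from a time-varying R_t (illustrative only).

    rt:              R_t for each day to simulate
    generation_pmf:  probability that the generation time is 1, 2, ..., G days
    delay_pmf:       probability that the infection-to-report delay is 0, 1, ..., D days
    seed_infections: infections on the days before the simulation (at least G values)
    dow_effect:      7 multiplicative day-of-week effects
    phi:             negative binomial overdispersion (larger = closer to Poisson)
    """
    g = np.asarray(generation_pmf, dtype=float)
    infections = [float(x) for x in seed_infections]
    for r in rt:
        # Assumed renewal equation: I_t = R_t * sum_s g_s * I_{t-s}
        recent = np.asarray(infections[-len(g):])[::-1]  # 1, 2, ..., G days ago
        infections.append(r * float(np.dot(g, recent)))
    infections = np.asarray(infections)

    # Expected reports: infections convolved with the reporting delay ...
    expected = np.convolve(infections, delay_pmf)[: len(infections)]
    # ... scaled by a day-of-week effect.
    expected = expected * np.asarray(dow_effect, dtype=float)[np.arange(len(expected)) % 7]

    # Negative binomial with mean `expected` and size phi: NumPy parametrises it
    # by (n, p), so use n = phi and p = phi / (phi + mean).
    p = phi / (phi + expected)
    reports = rng.negative_binomial(phi, p)
    return infections, expected, reports


rng = np.random.default_rng(42)
infections, expected, reports = simulate_reports(
    rt=np.full(28, 1.2),
    generation_pmf=[0.2, 0.5, 0.3],
    delay_pmf=[0.1, 0.3, 0.4, 0.2],
    seed_infections=[100, 100, 100],
    dow_effect=[1.1, 1.1, 1.0, 1.0, 1.0, 0.9, 0.9],
    phi=10.0,
    rng=rng,
)
print(np.round(expected[-7:]), reports[-7:])
```

Fitting the model amounts to running this logic in reverse: given observed reports, infer plausible Rt trajectories (and delays), which is what EpiNow2 does with Bayesian inference rather than with fixed inputs.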